Machine Learning is a field of computer science that gives computers the ability to learn without being explicitly programmed. - Arthur Samuel (1959)
Machine Learning is an approach or subset of AI, with emphasis on "learning" rather than just computer programming. Here a machine uses complex algorithms to analyse massive amounts of data, recognize patterns among the data, and make predictions - without requiring a person to program specific instructions into the machine's software. The system's pattern recognition improves over time as it learns from its mistakes and corrects itself, just like a human would.
Machine Learning models are able to learn from data without the explicit help of a human. That is the main difference between machine learning models and classical algorithms. Classical algorithms are told how to find the best answer in a complex system, and the algorithm then searches for the best solutions, often faster and more efficiently than a human. However, the bottleneck here is that a human has to first come up with the best solution. In machine learning, the model is not told the best solution; instead, it is given several examples of the problem and told to figure out the best solution.
Unlike hard-coding a software program with specific instructions to complete a task, Machine Learning allows a system to learn to recognize patterns on its own and make predictions.
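This contrast can be sketched in a few lines of Python. The task and all the numbers below are hypothetical; the point is only that the first rule is written by a human in advance, while the second is discovered from labeled examples:

```python
# Contrast between a classical, hand-written rule and a learned one.
# The task and all numbers are made-up illustrative values.

# Classical approach: a human decides the rule in advance.
def hand_written_rule(exclamation_marks):
    return exclamation_marks >= 3   # the "3" was chosen by a person

# Machine learning approach: discover the rule from labeled examples.
def learn_threshold(examples):
    """examples: list of (exclamation_marks, is_spam) pairs.
    Returns the cutoff that misclassifies the fewest examples."""
    best_t, best_errors = None, float("inf")
    for t in sorted({count for count, _ in examples}):
        errors = sum((count >= t) != is_spam for count, is_spam in examples)
        if errors < best_errors:
            best_t, best_errors = t, errors
    return best_t

# Labeled examples: (number of exclamation marks, was it spam?)
data = [(0, False), (1, False), (4, True), (7, True), (2, False), (5, True)]
threshold = learn_threshold(data)
print(threshold)  # -> 4: discovered from the data, not written by hand
```

Give the learner different examples and it will discover a different cutoff; the hand-written rule never changes unless a human rewrites it.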
Artificial Intelligence is intelligence displayed by machines in contrast with the natural intelligence displayed by humans.
Artificial intelligence is a broader concept than machine learning, which addresses the use of computers to mimic the cognitive functions of humans.
When machines carry out tasks based on algorithms in an “intelligent” manner, that is AI. Machine learning is a subset of AI and focuses on the ability of machines to receive a set of data and learn for themselves, changing algorithms as they learn more about the information they are processing.
You can think of deep learning, machine learning and artificial intelligence as a set of Russian dolls nested within each other, beginning with the smallest and working out. Deep learning is a subset of machine learning, which is a subset of AI.
In short, machine learning and deep learning are categorized under AI, but AI isn't necessarily machine learning or deep learning.
Deep Learning, a subset of Machine Learning, takes computer intelligence even further. It uses massive amounts of data and computing power to simulate Deep Neural Networks. Essentially, these networks imitate the human brain's connectivity, classifying data sets and finding correlations between them. With its newfound knowledge, acquired without human intervention, the machine can then apply its insights to other data sets. The more data the machine has at its disposal, the more accurate its predictions will be.
Deep Learning can be expensive and requires massive datasets to train itself on. That's because there is a huge number of parameters that need to be understood by a learning algorithm, which can initially produce a lot of false positives. For instance, a deep learning algorithm could be instructed to "learn" what a cat looks like. It would take a massive data set of images for it to understand the very minor details that distinguish a cat from, say, a cheetah, a panther, or a fox.
Deep Learning was inspired by the structure and function of the brain, namely the interconnection of many neurons. Artificial Neural Networks (ANN) are algorithms that mimic the biological structure of the brain. In ANNs, there are "neurons" which have discrete layers and connections to other "neurons". Each layer picks out a specific feature to learn, such as curves or edges in image recognition. It's this layering that gives deep learning its name: depth is created by using multiple layers as opposed to a single layer.
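As a rough sketch of this layering, here is a tiny two-layer forward pass in plain NumPy. The weights are arbitrary toy values (a real network would learn them from data); the example only illustrates how each layer transforms its input and passes the result to the next:

```python
import numpy as np

# A minimal two-layer "neural network" forward pass. Each layer is a weight
# matrix followed by a nonlinearity, mirroring the layered "neurons and
# connections" described above. Weights are fixed toy values, not learned.

def relu(x):
    return np.maximum(0, x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([0.5, -1.2, 0.3])          # input features
W1 = np.array([[0.2, -0.5, 0.1],        # layer 1: 3 inputs -> 2 neurons
               [0.7,  0.3, -0.4]])
W2 = np.array([[0.6, -0.8]])            # layer 2: 2 neurons -> 1 output

hidden = relu(W1 @ x)                   # first layer extracts features
output = sigmoid(W2 @ hidden)           # second layer combines them
print(output)                           # a single score between 0 and 1
```

Training would consist of adjusting `W1` and `W2` so that the output matches labeled examples; stacking more such layers is what makes a network "deep".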
So what contributed to the emergence of Machine Learning? Basically, there were two factors:
An algorithm is a set of rules to be followed when solving problems. In machine learning, algorithms take in data and perform calculations to find an answer. The calculations can be very simple or they can be more on the complex side.
Algorithms need to be trained to learn how to classify and process information. The efficiency and accuracy of the algorithm are dependent on how well the algorithm was trained. Using an algorithm to calculate something does not automatically mean machine learning or AI was being used. All squares are rectangles, but not all rectangles are squares.
Unfortunately, today, we often see the machine learning and AI buzzwords being thrown around to indicate that an algorithm was used to analyze data and make a prediction. Using an algorithm to predict an outcome of an event is not machine learning. Using the outcome of your prediction to improve future predictions is.
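That distinction (predicting versus learning from the outcome of a prediction) can be illustrated with a minimal, hypothetical example. The predictor below does not just emit a guess; it uses each observed outcome to adjust its next guess:

```python
# A sketch of the feedback loop described above: a running-estimate
# "learner" that updates itself from each observed outcome.
# The numbers and learning rate are arbitrary illustrative choices.

class LearningPredictor:
    def __init__(self, initial_guess=0.0, learning_rate=0.3):
        self.estimate = initial_guess
        self.lr = learning_rate

    def predict(self):
        return self.estimate

    def learn(self, actual):
        # Move the estimate toward the observed outcome.
        self.estimate += self.lr * (actual - self.estimate)

model = LearningPredictor()
for observed in [10, 12, 11, 13, 12]:
    guess = model.predict()
    model.learn(observed)   # the feedback step is what makes it "learning"

print(round(model.estimate, 2))  # the estimate has moved toward the data
```

Remove the `learn` call and you are left with a static formula; it is the update step that earns the "machine learning" label.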
There are three main types of Machine Learning.
Supervised learning algorithms are trained using labeled examples - that is, inputs where the desired output is known.
Simply put, supervised learning finds associations between features of a dataset and a target variable. For example, supervised learning models might try to find the association between a person's health features (heart rate, obesity level, and so on) and that person's risk of having a heart attack (the target variable).
These associations allow supervised models to make predictions based on past examples. This is often the first thing that comes to people's minds when they hear the phrase machine learning, but it in no way encompasses the realm of machine learning. Supervised Machine Learning models are often called predictive analytics models, named for their ability to predict the future based on the past.
Supervised Learning requires a certain type of data called labeled data. This means that we must teach our model by giving it historical examples that are labeled with the correct answer.
Specifically, supervised learning works by using one part of the data to predict another part. First, we must separate the data into two parts, as follows:
The predictors: the features of the data used to make the prediction
The response: the outcome we wish to predict
Supervised Learning attempts to find a relationship between the predictors and the response in order to make a prediction. The idea is that in the future a data observation will present itself and we will only know the predictors. The model will then have to use the predictors to make an accurate prediction of the response value.
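A minimal sketch of this separation, using a made-up heart-attack dataset (the column names are illustrative, not from any real data):

```python
# Splitting a dataset into predictors (X) and response (y).
# All rows and column names below are fabricated for illustration.

rows = [
    {"cholesterol": 230, "blood_pressure": 140, "smoker": 1, "heart_attack": 1},
    {"cholesterol": 180, "blood_pressure": 120, "smoker": 0, "heart_attack": 0},
    {"cholesterol": 210, "blood_pressure": 135, "smoker": 1, "heart_attack": 1},
]

response_column = "heart_attack"

# X keeps every column except the response; y keeps only the response.
X = [{k: v for k, v in row.items() if k != response_column} for row in rows]
y = [row[response_column] for row in rows]

print(y)  # the labels the model will learn to predict
```

A future observation will arrive with only the `X` columns filled in; the model's job is to supply the missing `y`.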
Most experts estimate that approximately 70 percent of machine learning is supervised learning.
Suppose we wish to predict if someone will have a heart attack within a year. To predict this, we are given that person's cholesterol, blood pressure, height, smoking habits, and perhaps more. From this data, we must ascertain the likelihood of a heart attack. Suppose, to make this prediction, we look at previous patients and their medical history. As these are previous patients, we know not only their predictors (cholesterol, blood pressure, and so on), but also whether they actually had a heart attack (because it already happened!).
This is a supervised machine learning problem because we are:
Learning from historical examples that are labeled with the correct answer (previous patients whose outcomes are known)
Using those examples to predict a response (the likelihood of a heart attack) for new, unseen patients
The hope here is that a patient will walk in tomorrow and our model will be able to identify whether or not the patient is at risk for a heart attack based on his or her conditions (just like a doctor would).
As the model sees more and more labeled data, it adjusts itself in order to match the correct labels given to us. We can use different metrics to pinpoint exactly how well our supervised machine learning model is doing and how it can better adjust itself.
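As an illustration of a model adjusting to labeled data, here is a deliberately simple supervised learner for the heart-attack example: a 1-nearest-neighbor rule on fabricated patient records. This is only a sketch of the idea, not the model a practitioner would necessarily choose:

```python
import math

# A minimal supervised model: 1-nearest-neighbor on fabricated patients.
# Each patient: (cholesterol, systolic blood pressure), label: heart attack?
patients = [((250, 150), 1), ((180, 110), 0), ((240, 145), 1), ((170, 115), 0)]

def predict(new_patient):
    # Label the new patient like the most similar past patient.
    def distance(p):
        return math.dist(p[0], new_patient)
    nearest = min(patients, key=distance)
    return nearest[1]

print(predict((245, 148)))  # resembles the high-risk patients -> 1
print(predict((175, 112)))  # resembles the low-risk patients -> 0
```

Adding more labeled patients to the list changes what "most similar" means, which is exactly the sense in which the model adjusts itself as it sees more data.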
One of the biggest drawbacks of supervised machine learning is that we need this labeled data, which can be very difficult to get hold of. To predict heart attacks, we might need thousands of patients along with all of their filled-in medical information and years' worth of follow-up records for each person, which could be a nightmare to obtain.
In short, supervised models use historical labeled data in order to make predictions about the future. Some possible applications for supervised learning include:
Supervised learning exploits the relationship between the predictors and response to make predictions, but sometimes it is enough just knowing that there even is a relationship. Suppose we are using a supervised learning model to predict whether or not a customer will purchase a given item. A possible dataset might look like this:
Person ID | Age | Gender | Employed | Bought the Product?
---|---|---|---|---
1 | 63 | F | N | Y
2 | 24 | M | Y | N
Note that in this case the predictors are Age, Gender, and Employed, while our response is "Bought the Product?". This is because we want to see if, given someone's age, gender, and employment status, they will buy the product.
Assume that a model is trained on this data and can make accurate predictions about whether or not someone will buy something. That, in and of itself, is exciting but there's something else that is arguably even more exciting. The fact that we could make accurate predictions implies that there is a relationship between these variables, which means that to know if someone will buy your product, you only need to know their age, gender and employment status. This might contradict the previous market research indicating that much more must be known about a potential customer to make such a prediction.
This speaks to supervised learning's ability to understand which predictors affect the response and how. For example, are women more likely to buy the product? Which age groups are prone to decline the product? Is there a combination of age and gender that is a better predictor than any one column on its own? As someone's age increases, do their chances of buying the product go up, down, or stay the same?
It is also possible that not all the columns are necessary. A possible output of a machine learning model might suggest that only certain columns are necessary to make the prediction, and that the other columns are only noise: they do not correlate with the response and therefore confuse the model.
There are two types of supervised learning models: regression and classification. The difference between the two is quite simple and lies in the response variable.
Regression models attempt to predict a continuous response. This means that the response can take on a range of infinite values. Consider the following examples:
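For instance, the simplest regression model fits a straight line to data by least squares and predicts a continuous value; the numbers below are made up:

```python
# Least-squares fit of y = a*x + b, predicting a continuous response.
# The data points are fabricated to lie roughly on y = 2x.

xs = [1, 2, 3, 4, 5]
ys = [2.1, 4.0, 6.2, 7.9, 10.1]

n = len(xs)
mean_x, mean_y = sum(xs) / n, sum(ys) / n

# Closed-form slope and intercept for simple linear regression.
a = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
     / sum((x - mean_x) ** 2 for x in xs))
b = mean_y - a * mean_x

print(round(a, 2), round(b, 2))  # slope near 2, intercept near 0
prediction = a * 6 + b           # a continuous prediction for x = 6
```

The prediction can land anywhere on the number line, which is exactly what makes this regression rather than classification.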
Classification attempts to predict a categorical response, which means that the response only has a finite amount of choices. Examples include the ones given as follows:
The following graphs show a relationship between three categorical variables (age, year they were born and education level) and a person's wage:
Note that even though the predictor is categorical, this example is regressive because the y-axis, our dependent variable, our response, is continuous.
Our earlier heart attack example is classification because the response was "will this person have a heart attack within a year?", which has only two possible answers: Yes or No.
Sometimes it can be tricky to decide whether you should use classification or regression. Consider that we are interested in the weather outside. We could ask the question, how hot is it outside?, in which case the answer is on a continuous scale, and some possible answers are 79 degrees or 98 degrees. However, as an exercise, if we go and ask 10 people what the temperature is outside, most of them will not answer in exact degrees but will bucket their answer and say something like it's in the 60s.
We might consider this as a classification problem, where the response variable is no longer in exact degrees but is in a bucket. There would only be a finite number of buckets in theory, making the model perhaps learn the differences between 60s and 70s a bit better.
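The bucketing step itself is straightforward; a sketch:

```python
# Turning the continuous "how hot is it?" response into decade buckets,
# so the problem becomes classification instead of regression.

def to_bucket(temperature):
    decade = int(temperature // 10) * 10
    return f"{decade}s"

readings = [63.5, 68.0, 79.2, 98.6]
print([to_bucket(t) for t in readings])  # ['60s', '60s', '70s', '90s']
```

After this transformation the response has only a finite number of possible values, which is the defining trait of a classification problem.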
Unsupervised learning does not deal with predictions but has a much more open objective. Unsupervised learning takes in a set of predictors and utilizes relationships between the predictors in order to accomplish tasks, such as the following:
Representing the data using fewer features while preserving as much information as possible
Finding groups of data points that behave similarly to one another
The first element on this list is called dimension reduction and the second is called clustering. Both of these are examples of unsupervised learning because they do not attempt to find a relationship between predictors and a specific response and therefore are not used to make predictions of any kind. Unsupervised models, instead, are utilized to find organizations and representations of the data that were previously unknown.
A big advantage of unsupervised learning is that it does not require labeled data, which means that it is much easier to find data that works with unsupervised learning models. Of course, a drawback to this is that we lose all predictive power, because the response variable holds the information needed to make predictions, and without it our model will be hopeless at making any sort of prediction.
A big drawback is that it is difficult to see how well we are doing. In a regression or classification problem, we can easily tell how well our model is predicting by comparing its answers to the actual answers. For example, if our supervised model predicts rain and it is sunny outside, the model was incorrect. If our supervised model predicts the price will go up by 1 dollar and it goes up by 99 cents, our model was very close! In unsupervised modeling, this concept is foreign because we have no answers to compare our model to. Unsupervised models merely suggest differences and similarities, which then require a human's interpretation.
Popular techniques include self-organising maps, nearest-neighbour mapping, k-means clustering and singular value decomposition. These algorithms are also used to segment text topics, recommend items and identify data outliers.
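As a sketch of one of the techniques named above, here is k-means clustering on one-dimensional data. Note that no labels are involved; the algorithm only groups points by similarity:

```python
# A minimal k-means clustering sketch on 1-D points (no labels involved).
# The data and starting centers are made-up illustrative values.

def kmeans_1d(points, centers, iterations=10):
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center's cluster.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: each center moves to the mean of its cluster.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

points = [1.0, 1.2, 0.8, 9.0, 9.5, 8.7]      # two obvious groups
centers, clusters = kmeans_1d(points, centers=[0.0, 10.0])
print(sorted(round(c, 2) for c in centers))  # centers settle near each group
```

The output is a grouping, not a prediction: deciding what the two clusters *mean* is left to a human, which is exactly the interpretation burden described above.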
About 10 to 20 percent of machine learning is unsupervised learning, although this area is growing rapidly.
With reinforcement learning, the algorithm discovers for itself which actions yield the greatest rewards through trial and error. The algorithm then adjusts itself and modifies its strategy in order to accomplish some goal, which is usually to get more rewards.
Reinforcement learning has three primary components:
The agent: the learner or decision maker performing actions
The environment: the world the agent interacts with
The rewards: the feedback the agent receives for its actions
The objective is for the agent to choose actions that maximize the expected reward over a given period of time. The agent will reach the goal much quicker by following a good policy, so the goal in reinforcement learning is to learn the best policy.
This type of machine learning is very popular in AI-assisted game play, as agents (the AI) are allowed to explore a virtual world, collect rewards, and learn the best navigation techniques. This model is also popular in robotics, especially in the field of self-automated machinery, including cars.
Reinforcement learning can be thought of as similar to supervised learning in that the agent learns from its past actions to make better moves in the future; however, the main difference lies in the reward. The reward does not have to be tied in any way to a "correct" or "incorrect" decision; it simply encourages or discourages different actions.
Markov decision processes (MDPs) are popular models used in reinforcement learning.
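A toy sketch of these ideas, assuming a hypothetical five-state corridor as the environment: the agent learns, purely by trial and error, that moving right leads to the reward. The method is tabular Q-learning, a standard algorithm for small MDPs; all states, rewards, and hyperparameters below are illustrative assumptions:

```python
import random

# Toy Q-learning in a 5-state corridor MDP. The agent starts at state 0;
# reaching state 4 yields a reward of +1. Everything here is illustrative.

random.seed(0)
N_STATES, GOAL = 5, 4
ACTIONS = [-1, +1]                      # move left or right
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma, epsilon = 0.5, 0.9, 0.2   # learning rate, discount, exploration

for episode in range(200):
    state = 0
    while state != GOAL:
        # Epsilon-greedy: mostly exploit the best known action, sometimes explore.
        if random.random() < epsilon:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: Q[(state, a)])
        next_state = min(max(state + action, 0), GOAL)
        reward = 1.0 if next_state == GOAL else 0.0
        best_next = max(Q[(next_state, a)] for a in ACTIONS)
        # Q-learning update: nudge toward reward + discounted future value.
        Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
        state = next_state

# The learned policy: the best-valued action at each non-goal state.
policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(GOAL)}
print(policy)
```

No state was ever labeled "correct"; the reward alone shaped the behavior, which is the distinction from supervised learning drawn above.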
Reinforcement learning is often used for robotics and navigation.
Each of the three types of machine learning has its benefits and also its drawbacks as listed:
Supervised learning: this exploits relationships between predictors and response variables to make predictions about future data observations.
Unsupervised learning: this finds similarities and differences between data points.
Reinforcement learning: this is reward-based learning that encourages agents to take particular actions in their environments.
There are many caveats to machine learning. Many are specific to the particular model being implemented, but there are some assumptions that are universal for any machine learning model, as follows: